Solving for Best Responses and Equilibria in Extensive-Form Games with Reinforcement Learning Methods

Authors

  • Amy Greenwald
  • Jiacui Li
  • Eric Sodomka
Abstract

We present a framework to solve for best responses and equilibria in an extensive-form game (EFG) of imperfect information by transforming the game into a set of Markov decision processes (MDPs), and then applying simulation-based reinforcement learning to those MDPs. More specifically, we first transform a turn-taking partially observable Markov game (TT-POMG) into a set (one per player) of partially observable Markov decision processes (POMDPs), and we then transform that set of POMDPs into a corresponding set of MDPs. Next, we observe that EFGs are a special case of TT-POMGs, and hence can be transformed as described. Furthermore, because each transformation preserves the strategically relevant information of the model to which it is applied, an optimal policy in one of the ensuing MDPs corresponds to a best response in the original EFG. We then go on to prove that our reinforcement learning algorithm finds a near-optimal policy (and therefore a near-best response in the original EFG) in finite time, although the sample complexity is bounded below by a function with an exponential dependence on the horizon. Nonetheless, we apply this algorithm iteratively to search for equilibria in an EFG. When the iterative procedure converges, the resulting MDP policies comprise an approximate Bayes-Nash equilibrium. Although this procedure is not guaranteed to converge, it frequently did so in numerical experiments with sequential auctions.
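
The iterative search described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: it assumes a two-player normal-form game with pure strategies in place of an EFG, and an exact argmax oracle in place of the simulation-based reinforcement learning step. The names `best_response`, `iterate_best_responses`, and the example payoff tables are hypothetical. It shows only the outer loop: each player best-responds in turn to the other's current strategy, and the loop stops when no player changes strategy or when an iteration cap is reached, since convergence is not guaranteed.

```python
from typing import List, Tuple

# payoffs[a0][a1] = (utility of player 0, utility of player 1)
Payoffs = List[List[Tuple[float, float]]]


def best_response(payoffs: Payoffs, player: int, opponent_action: int) -> int:
    """Exact best response to a fixed opponent action.

    Assumption: this argmax stands in for the learned near-best response
    that the abstract obtains by reinforcement learning on an MDP.
    """
    if player == 0:
        utilities = [row[opponent_action][0] for row in payoffs]
    else:
        utilities = [cell[1] for cell in payoffs[opponent_action]]
    return max(range(len(utilities)), key=utilities.__getitem__)


def iterate_best_responses(
    payoffs: Payoffs, start: Tuple[int, int] = (0, 0), max_iters: int = 100
) -> Tuple[Tuple[int, int], bool]:
    """Alternate best responses until no player changes strategy.

    Returns the final profile and a flag saying whether it converged.
    """
    profile = list(start)
    for _ in range(max_iters):
        changed = False
        for player in (0, 1):
            response = best_response(payoffs, player, profile[1 - player])
            if response != profile[player]:
                profile[player] = response
                changed = True
        if not changed:
            # Fixed point: each strategy is a best response to the other.
            return (profile[0], profile[1]), True
    return (profile[0], profile[1]), False


# Prisoner's dilemma (0 = cooperate, 1 = defect): the loop converges.
prisoners_dilemma: Payoffs = [
    [(-1, -1), (-3, 0)],
    [(0, -3), (-2, -2)],
]

# Matching pennies: there is no pure-strategy fixed point, so the loop cycles.
matching_pennies: Payoffs = [
    [(1, -1), (-1, 1)],
    [(-1, 1), (1, -1)],
]

pd_profile, pd_converged = iterate_best_responses(prisoners_dilemma)
mp_profile, mp_converged = iterate_best_responses(matching_pennies)
```

In this toy setting a converged profile is a pure-strategy Nash equilibrium. The matching-pennies run hits the iteration cap without converging, which mirrors the abstract's caveat that the iterative procedure is not guaranteed to converge.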

Similar resources

Solving for Best Responses in Extensive-Form Games using Reinforcement Learning Methods

We present a framework to solve for best responses in extensive-form games (EFGs) with imperfect information by transforming the games into Information-Set MDPs (ISMDPs), and then applying simulation-based reinforcement learning methods to the ISMDPs. We first show that, from the point of view of a single player, an EFG can be represented as an Information-Set POMDP (ISPOMDP) whose states corre...

Regularized Best Responses and Reinforcement Learning in Games

We investigate a class of reinforcement learning dynamics in which each player plays a “regularized best response” to a score vector consisting of his actions’ cumulative payoffs. Regularized best responses are single-valued regularizations of ordinary best responses obtained by maximizing the difference between a player’s expected cumulative payoff and a (strongly) convex penalty term. In cont...

Lecture Notes on Game Theory

1. Extensive form games with perfect information: 1.1. Chess; 1.2. Definition of extensive form games with perfect information; 1.3. The ultimatum game; 1.4. Equilibria; 1.5. The centipede game; 1.6. Subgames and subgame perfect equilibria; 1.7. Backward induction, Kuhn’s Theorem and a proof of Zermelo’s Theorem. 2. Strategic form games: 2.1. Definition; 2.2. Nash equilibria; 2.3....

Solving extensive-form games with double-oracle methods

We investigate iterative algorithms for computing exact Nash equilibria in two-player zero-sum extensive-form games. The algorithms use an algorithmic framework of double-oracle methods. The main idea is to restrict the game by allowing the players to play only some of the strategies, and then iteratively solve this restricted game and exploit fast best-response algorithms to add additional str...

Joint Learning in Stochastic Games: Playing Coordination Games Within Coalitions

Despite the progress in multiagent reinforcement learning via formalisms based on stochastic games, these have difficulties coping with a high number of agents due to the combinatorial explosion in the number of joint actions. One possible way to reduce the complexity of the problem is to let agents form groups of limited size so that the number of the joint actions is reduced. This paper inves...


Publication date: 2015